Romanian Zero Pronoun Distribution: A Comparative Study

Authors

  • Claudiu Mihaila
  • Iustina Ilisei
  • Diana Inkpen
Abstract

Anaphora resolution remains a challenging research area in natural language processing, which still lacks an algorithm that correctly resolves anaphoric pronouns. Anaphoric zero pronouns pose an even greater challenge because they are not lexically realised; consequently, their resolution depends on a prior identification stage. This paper reports on the distribution of zero pronouns in Romanian across four genres: encyclopaedic, legal, literary, and news-wire texts. For this purpose, the RoZP corpus was created, containing almost 50,000 tokens and 800 manually annotated zero pronouns. The distribution patterns are compared across genres, and exceptional cases are presented in order to support the development of a future zero pronoun identification and resolution algorithm. The evaluation results show that zero pronouns occur frequently in Romanian and that their distribution depends largely on genre. Additionally, possible features for their identification are revealed, and a search scope for the antecedent is determined, increasing the chances of correct resolution.

Similar resources

To Be or Not to Be a Zero Pronoun: a Machine Learning Approach for Romanian

This paper presents a new study on the distribution and identification of zero pronouns in Romanian. A Romanian corpus that includes legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments have been performed on the created corpus for the identification of verbs ...

A Comparative Study of Spanish Zero Pronoun Distribution

The aim of this paper is to report the distribution of Spanish zero pronouns in three different genres: legal, encyclopaedic and instructional. The Z-corpora were created for this purpose and a sample of 1043 zero pronouns was annotated. The most salient patterns of distribution are compared for each genre, and some relevant issues concerning the use of zero pronouns are described in relation t...

The Impact of Zero Pronominal Anaphora on Translational Language: a Study on Romanian Newspapers

This study investigates the impact of zero pronominal anaphora for Romanian on a learning model able to distinguish between translated and non-translated texts. Even though the correct understanding of ellipsis from the source language and its mapping into the target language is essential in the translation process, zero pronominal anaphora has been scarcely investigated in the context of trans...

Resolving Romanian Zero Pronouns: A Machine Learning Approach

This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts, has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...

Zero Pronominal Anaphora Resolution for the Romanian Language

This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts, has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...

Publication date: 2010